Accelerating Spherical k-Means
Authors
Abstract
Spherical k-means is a widely used clustering algorithm for sparse and high-dimensional data such as document vectors. While several improvements and accelerations have been introduced for the original k-means algorithm, not all of them translate easily to the spherical variant: many acceleration techniques, such as the algorithms of Elkan and Hamerly, rely on the triangle inequality of Euclidean distances. Spherical k-means, however, uses Cosine similarities instead of distances for computational efficiency. In this paper, we incorporate the Elkan and Hamerly accelerations into the spherical k-means algorithm, working directly with the Cosines instead of Euclidean distances, to obtain a substantial speedup, and we evaluate these spherical accelerations on real data.
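To make the idea concrete, the following is a minimal sketch of how a triangle inequality stated directly on Cosine similarities can prune center comparisons. It is an illustration under stated assumptions, not the authors' implementation: the function names (`cosine_upper_bound`, `cosine_lower_bound`, `assign_with_pruning`) are hypothetical, all vectors are assumed to be unit-normalized, and the bounds used are the standard angle-based ones, cos(a + b) <= sim(x, z) <= cos(a - b), where a and b are the angles between x and y and between y and z.

```python
import math
import random


def dot(u, v):
    """Dot product; equals the Cosine similarity for unit vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale v to unit length (assumes v is not the zero vector)."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def _sine_product(sim_xy, sim_yz):
    # sin(a) * sin(b), clamped at 0 to absorb floating-point rounding.
    return math.sqrt(max(0.0, (1.0 - sim_xy ** 2) * (1.0 - sim_yz ** 2)))


def cosine_upper_bound(sim_xy, sim_yz):
    """Upper bound on sim(x, z) given sim(x, y) and sim(y, z): cos(a - b)."""
    return sim_xy * sim_yz + _sine_product(sim_xy, sim_yz)


def cosine_lower_bound(sim_xy, sim_yz):
    """Lower bound on sim(x, z) given sim(x, y) and sim(y, z): cos(a + b)."""
    return sim_xy * sim_yz - _sine_product(sim_xy, sim_yz)


def assign_with_pruning(x, centers, center_sims):
    """Assign unit vector x to its most similar center.

    center_sims[i][j] holds the precomputed similarity of centers i and j.
    A center j is skipped when the upper bound on sim(x, centers[j]) cannot
    exceed the best similarity found so far.
    Returns (best_index, number_of_similarities_computed).
    """
    best = 0
    best_sim = dot(x, centers[0])
    computed = 1
    for j in range(1, len(centers)):
        if cosine_upper_bound(best_sim, center_sims[best][j]) <= best_sim:
            continue  # center j cannot be strictly more similar than best
        sim = dot(x, centers[j])
        computed += 1
        if sim > best_sim:
            best, best_sim = j, sim
    return best, computed


if __name__ == "__main__":
    random.seed(0)
    centers = [normalize([random.gauss(0, 1) for _ in range(5)]) for _ in range(4)]
    center_sims = [[dot(a, b) for b in centers] for a in centers]
    x = normalize([random.gauss(0, 1) for _ in range(5)])
    index, computed = assign_with_pruning(x, centers, center_sims)
    print("assigned to center", index, "using", computed, "similarity computations")
```

The pruning test plays the role that the Euclidean condition d(c, c') >= 2 d(x, c) plays in Elkan-style acceleration of standard k-means: a candidate center is skipped whenever its angle to the current best center is at least twice the angle between the point and that best center. A full accelerated algorithm would also maintain bounds across iterations, which this sketch omits.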
Similar resources
Title Spherical K-means Clustering
October 21, 2009 Type Package Title Spherical k-Means Clustering Version 0.1-2 Author Kurt Hornik, Ingo Feinerer, Martin Kober Maintainer Kurt Hornik Description Algorithms to compute spherical k-means partitions. Features several methods, including a genetic and a simple fixed-point algorithm and an interface to the CLUTO vcluster program. License GPL-2 Imports slam...
Accelerating Lloyd’s Algorithm for k-Means Clustering
The k-means clustering algorithm, a staple of data mining and unsupervised learning, is popular because it is simple to implement, fast, easily parallelized, and offers intuitive results. Lloyd’s algorithm is the standard batch, hill-climbing approach for minimizing the k-means optimization criterion. It spends a vast majority of its time computing distances between each of the k cluster center...
Package 'skmeans' Title Spherical K-means Clustering
Description Algorithms to compute spherical k-means partitions. Features several methods, including a genetic and a fixed-point algorithm and an interface to the CLUTO vcluster program.
K-Means for Spherical Clusters with Large Variance in Sizes
Data clustering is an important data exploration technique with many applications in data mining. The k-means algorithm is well known for its efficiency in clustering large data sets. However, this algorithm is suitable for spherical shaped clusters of similar sizes and densities. The quality of the resulting clusters decreases when the data set contains spherical shaped with large variance in ...
Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm
Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group/cluster are more similar in some sense or another. It is a main task of exploratory data mining, and a common technique for statistical data analysis. This paper proposed an improved version of the K-Means algorithm, namely Persistent K...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2021
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-030-89657-7_17